Dynamic performance data collection in large computer servers

ABSTRACT

A mechanism is provided for collecting one or more performance metrics. A plurality of source code instructions is provided. The source code instructions include a plurality of macro calls. Each of the plurality of macro calls includes a plurality of predetermined parameters. A plurality of object code instructions corresponding to the plurality of source code instructions is executed. In response to receiving a signal identifying at least one of the plurality of macro calls contained in the source code instructions and identifying a desired level of granularity, performance metrics are collected using the identified macro call in accordance with the desired level of granularity.

BACKGROUND

The disclosure relates generally to performance monitoring, and more specifically to dynamic performance data collection in large computer servers.

IBM System z® is a product line of large computer servers or mainframes based on the z/Architecture® provided by International Business Machines Corporation (IBM®) of Armonk, N.Y. IBM System z® computers can utilize a derivative of the Multiple Virtual Storage (MVS) operating system, which is a robust mainframe operating system utilized by many generations of IBM® mainframe computers. Derivatives of the MVS™ operating system can include the OS/390® operating system and IBM z/OS® (IBM System z®, z/Architecture®, IBM®, OS/390® and IBM z/OS® are registered trademarks of International Business Machines Corporation, located in Armonk, N.Y.).

IBM System z® computers typically run mainframe applications based on the programming languages designed for these environments, such as COBOL, PL/I and Assembler. These mainframe applications typically handle high volumes of data and/or high transaction rates within complex systems and user environments. Therefore, these applications are usually critical to the business in which the IBM System z® computer is installed.

Both execution time and cost are factors that must be considered when running mainframe applications. For example, execution time may be critical with respect to the performance of batch systems and with respect to particular response times required by certain applications (such as web-based or other on-line systems). However, increasing data volumes due to higher business complexity can cause batch processes to exceed desired time limits and can increase the response times of critical web-based applications to unacceptable levels. Furthermore, mainframe systems are often associated with high operating costs since many businesses pay for mainframes on a usage basis (for example, license costs are often coupled to the number of MIPS (“Million Instructions Per Second”) in a mainframe installation) and also may pay fixed license costs for mainframe software. Given the increasing pressure to reduce IT spending, these costs have become a major problem for many businesses using mainframe computers.

Given the increasing execution time requirements and cost pressures, businesses are forced to evaluate their current mainframe installations. One option available to such businesses is to upgrade the computer hardware. However, this creates additional fixed costs and is typically only a good choice if money is not a primary decision-driving factor or if the company needs to react immediately. Another option is to optimize and tune the system environment and the applications running in the environment. Utilities have been developed to gather specific performance related data in a mainframe environment, but to date these programs have been largely focused on measuring performance at a mainframe job level. These utilities do not provide a flexible way to capture data at more granular levels, such as, for example, a single assembler instruction.

SUMMARY

In one aspect, a method for collecting one or more performance metrics is provided. The method comprises providing a plurality of source code instructions having a plurality of macro calls among the plurality of source code instructions. Each of the plurality of macro calls includes a plurality of predetermined parameters. The method further comprises executing a plurality of object code instructions corresponding to the plurality of source code instructions. The method further comprises, in response to receiving a signal identifying at least one of the plurality of macro calls and identifying a desired level of granularity, collecting the one or more performance metrics using the identified at least one of the plurality of macro calls in accordance with the desired level of granularity.

In another aspect, a computer program product for collecting one or more performance metrics is provided. The computer program product comprises one or more computer-readable tangible storage devices and a plurality of program instructions stored on at least one of the one or more computer-readable tangible storage devices. The plurality of program instructions comprises program instructions to provide a plurality of source code instructions having a plurality of macro calls among the plurality of source code instructions. Each of the plurality of macro calls includes a plurality of predetermined parameters. The plurality of program instructions further comprises program instructions to execute a plurality of object code instructions corresponding to the plurality of source code instructions. The plurality of program instructions further comprises program instructions to, in response to receiving a signal identifying at least one of the plurality of macro calls and identifying a desired level of granularity, collect the one or more performance metrics using the identified at least one of the plurality of macro calls in accordance with the desired level of granularity.

In yet another aspect, a computer system for collecting one or more performance metrics is provided. The computer system comprises one or more processors, one or more computer-readable tangible storage devices, and a plurality of program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors. The plurality of program instructions comprises program instructions to provide a plurality of source code instructions having a plurality of macro calls among the plurality of source code instructions. Each of the plurality of macro calls includes a plurality of predetermined parameters. The plurality of program instructions further comprises program instructions to execute a plurality of object code instructions corresponding to the plurality of source code instructions. The plurality of program instructions further comprises program instructions to, in response to receiving a signal identifying at least one of the plurality of macro calls and identifying a desired level of granularity, collect the one or more performance metrics using the identified at least one of the plurality of macro calls in accordance with the desired level of granularity.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a block diagram of a logically partitioned computer system in which an embodiment of the present invention can be implemented.

FIG. 2 is a block diagram representation of language runtime environments illustrated in FIG. 1 in accordance with an embodiment of the present invention.

FIG. 3 is a block diagram illustrating an assembler language program having a macro definition in accordance with an embodiment of the present invention.

FIG. 4A depicts a portion of an exemplary C application program having a plurality of macro calls in accordance with an embodiment of the present invention.

FIG. 4B depicts a portion of an exemplary PL/I application program having a plurality of macro calls in accordance with an embodiment of the present invention.

FIG. 5 is a flowchart of a method for collecting one or more performance metrics performed by a language runtime environment executing an application program having a plurality of macro calls in accordance with an illustrative embodiment.

DETAILED DESCRIPTION

Embodiments of the present invention will now be described with reference to the figures. Embodiments of the present invention apply equally to all forms of large computer servers including mainframes. However, focus is directed to IBM System z® computers by means of example and explanation in the description of embodiments of the present invention.

The method of collecting performance metrics described herein provides flexibility, by using different levels of granularity, for capturing performance metrics, such as elapsed execution time described further below, for a variety of computer programs. As used herein, the term “level of granularity” refers generally to a value indicating a level of importance of a particular location in the source code in which a macro call has been inserted. It is noted that the term “macro call”, as used herein, refers to a single programming statement that is replaced, at compile time, by a plurality of programming instructions defined in a corresponding macro definition. In accordance with an embodiment of the present invention, performance metrics may be collected at different levels of detail without recompilation of the computer programs executing in a production environment. A plurality of macros, which may be called on to expand into operable code, may be inserted in the source code at various locations. The programmer may dynamically select or deselect a source code segment within an application program for which performance metrics may be collected. This may be done from the command line while the application program is running by identifying macros associated with starting and ending points within the given source code segment. Advantageously, each macro may be assigned a granularity level, which enables the programmer to control the granularity of collected data. In various embodiments, performance metrics may include elapsed execution time, CPU time, or the like.

The elapsed execution time (as opposed to CPU time) for executing a portion of code can be measured by retrieving the hardware clock timer values just before and just after executing the code fragment. The execution time elapsed between two points of reference in the code may be calculated as the difference between the two timer values, and may be referred to as elapsed time. In other words, the elapsed execution time is the difference in the hardware clock timer values retrieved at the beginning and at the end of the given fragment of code.
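The before-and-after timer reading described above can be sketched as follows. This is a minimal illustration only, assuming Python's `time.perf_counter` as a stand-in for the hardware clock; the helper name `measure_elapsed` is hypothetical and does not appear in the embodiments.

```python
import time

def measure_elapsed(fragment):
    """Return (result, elapsed_seconds) for one call of `fragment`.

    Hypothetical helper: perf_counter stands in for the hardware clock.
    """
    start = time.perf_counter()   # timer value just before the fragment
    result = fragment()
    end = time.perf_counter()     # timer value just after the fragment
    return result, end - start    # elapsed time is the difference

result, elapsed = measure_elapsed(lambda: sum(range(100_000)))
print(f"sum={result}, elapsed={elapsed:.6f}s")
```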

The elapsed execution time cannot always be used to accurately measure the consumption of CPU time by an application program. This is because the application program in question can become idle as a result of an operating system action. Therefore, if the elapsed time were used to measure the CPU time spent during execution of a fragment of code, correct results would be obtained only if the execution of the application program was not preempted during the measurement period. Accordingly, measurement of the CPU time may be a more appropriate metric than the elapsed execution time in some situations.
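The gap between the two metrics can be seen in a small sketch. It assumes Python's `time.process_time` as a stand-in for a CPU time counter and uses `time.sleep` to imitate a period in which the program is idle; the function name `measure_both` is hypothetical.

```python
import time

def measure_both(fragment):
    """Return (elapsed_seconds, cpu_seconds) for one call of `fragment`."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fragment()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

# The sleep imitates an idle period: elapsed time keeps growing
# while almost no CPU time is consumed.
wall, cpu = measure_both(lambda: time.sleep(0.2))
print(f"elapsed={wall:.3f}s cpu={cpu:.3f}s")
```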

FIG. 1 is a conceptual block diagram of a logically partitioned computer system generally designated 100 in which an embodiment of the present invention can be implemented. FIG. 1 is an illustration of one implementation and is not intended to imply any limitation with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environments may be made.

In one embodiment, logically partitioned computer system 100 can be a physical computer 110 such as an IBM® System z® mainframe computer (offered by International Business Machines Corporation, Armonk, N.Y.), although the present embodiment can be implemented in other server computers or personal computers as well. Logically partitioned computer system 100 may include multiple logical partitions 130, 140, 150. Each logical partition 130, 140, 150 may be capable of functioning as a separate system. That is, each logical partition can be independently reset, initially loaded with an operating system, if desired, and operate with different programs. In this particular example, each logical partition 130, 140, 150 may include an operating system 132, 142, and 152, respectively, which may provide standard operating system functions such as I/O, communication, etc. to its applications. It should be noted that resident operating systems running in the various logical partitions may differ. In one embodiment, operating system 132 is the IBM z/OS® operating system, which is offered by International Business Machines Corporation, Armonk, N.Y., while the other operating systems 142 and 152 may be, for example, but not limited to, the Linux operating system, which is open source software that is readily available on the Internet. Each operating system 132, 142, 152 may provide independent programming language runtime environments (LREs) 134, 144, and 154, respectively, in which different applications, such as applications 136, 146, and 156, may run. Each logical partition 130, 140, 150 may be capable of concurrently executing a number of different applications, such as application programs 136, 146, and 156, as shown in FIG. 1. By way of example, application programs 136, 146, and 156 can be a C program, PL/I program, Java program, assembler program, and the like. Base portion 120 participates in the actual logical partitioning of the physical computer 110 and its resources. For example, base portion 120 may partition the CPU(s), partition memory, partition I/O, etc.

In some cases a user may be interested in optimizing and tuning the system environment and the applications running in the environments 134, 144, and 154. In accordance with an embodiment of the present invention, application programs 136, 146, and 156 may include a plurality of macro calls among the plurality of source code instructions constituting the respective application program 136, 146 and 156. This plurality of macro calls enables the user to dynamically adjust the collection of performance metrics without recompiling application programs 136, 146, and 156 as discussed below with reference to FIGS. 4A, 4B, and 5. Physical computer 110 may also include memory area 160 which may be shared by all of the logical partitions 130, 140, 150, etc. Therefore, each logical partition 130, 140, 150 can directly address and access the shared memory area 160 to read data therefrom or write data thereto. In an embodiment, shared memory area 160 may include a macro definition library 162. Macro definition library 162 may store a plurality of macro definitions 164 that define the plurality of macros included in the application programs 136, 146, and 156. The respective logical partition may access the macro library 162 on behalf of the corresponding application program 136, 146 and 156.

FIG. 2 is a block diagram illustrating an exemplary programming LRE illustrated in FIG. 1 in accordance with an embodiment of the present invention. Embodiments of the present invention apply equally to all LREs 134, 144, 154. However, focus is directed to LRE 144 by means of example and explanation in the description of embodiments of the present invention. In software programming, a programmer writes code in a textual form (“source code”). This code is typically translated (by a program called a compiler, which is included in LRE 144) into another form (for example, “object code” contained in object files) which can be executed directly by a computer, such as physical computer 110. In other words, source code is typically human-readable but cannot be executed directly. Object code is not human-readable but typically can be executed by a computer. An application typically is a collection of one or more programs (referred to herein as application programs 136, 146, 156) cooperating to achieve particular objectives, such as inventory control or payroll. LRE 144 illustrated in FIG. 2 provides a framework within which an application runs. For example, during the creation of an executable version of the application it may be normal practice to combine a plurality of files (run units) into a single larger executable file, for example, when two or more object files are combined into a dynamic link library, or two or more class files are combined into a single jar file in the Java® (Java is a registered trademark of Oracle® Corporation located in Redwood Shores, Calif.) programming environment. Such a combination process is known as link-editing (or deployment in the Java® case). Thus, LRE 144 may include both compiler and linker programs.

The language-specific portions of LRE 144 may provide language interfaces and specific services that are supported for each individual language, and that can be called through a common callable interface. LRE 144 may include, for example, but not limited to, the following interfaces: Cobol interface 202, Fortran interface 204, PL/I interface 206, C/C++ interface 208, Java interface 209, and Assembler interface 210. Each of the interfaces 202, 204, 206, 208, 209, and 210 may include language specific libraries. In addition, LRE 144 may include essential runtime services interface 212. Essential runtime services interface 212 may include common library services, such as math or date and time services, that are commonly needed by programs running on the system. Essential runtime services interface 212 may also include basic routines that support starting and stopping programs, allocating storage, communicating with programs written in different languages, and indicating and handling conditions.

An embodiment of the present invention proceeds with reference to the IBM z/OS® environment. However, other environments provide similar functions and operations. This embodiment of the present invention may implement the plurality of macro calls as IBM z/OS® assembler programs in which run units are generated by compiling an assembler language program (source code) into object decks. An assembler language is considered a second generation language (2GL) because it is just one step up from the native language of the hardware, called machine language, which is a set of instructions in the form of combinations of ones and zeros. It should be noted that the act of compiling an assembler program does not produce anything directly executable but instead produces an object deck (run unit) which is source for a linkage editor (also known as a binder). The linkage editor may then be used to combine a plurality of object decks into a single executable file. In an embodiment of the present invention, an executable assembler program 302 may contain one or more macro definitions 164, as described below in conjunction with FIG. 3.

FIG. 3 is a block diagram illustrating an assembler language program having a macro definition in accordance with an embodiment of the present invention. Assembler interface 210 may include a macro processor (not shown), which may be used, in accordance with an embodiment of the invention, to receive and process an assembler language program 302 that contains at least one macro definition 164. Specifically, in an embodiment, macro definition 164 may contain a reserved word 304 (e.g. “macro_start”) that indicates the start of a macro and another reserved word 306 (e.g. “macro_end”) that indicates the end of the macro. As would be apparent to a skilled artisan, macro definition 164 may further include a plurality of statements written in assembler programming language that follow the start 304 and precede the end 306 of the macro definition 164. For example, macro definition 164 may include a plurality of assembler language statements which will evaluate macro parameters such as, for example, a “level” and an “ID” parameter. Macro definition 164 may further include assembler language statements that will trigger collection of the plurality of performance metrics, as discussed below in conjunction with FIG. 5.

FIG. 4A depicts a portion of an exemplary C application program 400 having a plurality of macro calls in accordance with an embodiment of the present invention. The source code of the C application program 400 comprises a plurality of instructions or statements, such as, for example, statement 402, written in “C” language. Each line of C application program 400 may be associated with a line number 404. In this example, line 1 represents library invocation in C language. Lines 4-60 contain the main function code, and lines 12-19 represent the counter function portion of the code. As illustrated in FIG. 4A, application programmers may insert a plurality of macro calls 406, 408, 410, 412, shown on lines 11, 29, 37, and 51, respectively, among the plurality of source code instructions. In an embodiment of the present invention, each macro invocation (call) 406, 408, 410, 412 may include at least two parameters, such as, for example, ID and level. In an embodiment of the present invention, the ID parameter of the macro call may enable application programmers to identify a particular location in the source code that could be used either as a starting point or ending point for collecting desired performance metrics. According to an embodiment of the present invention, the level parameter included in the macro call, such as 406, 408, 410, and 412, may advantageously indicate the level of importance that an application programmer may assign to the particular location associated with the corresponding macro invocation. For example, IDs associated with macro calls 406 and 410 may have values ‘12’ and ‘14’, respectively, where macro call 406 may be a starting point for collecting performance metrics and macro call 410 may be an ending point for collecting performance metrics. The level parameter value may be equal to ‘99’ for both macro calls 406 and 410. In an embodiment of the present invention, assigned value ‘99’ may represent the highest level of importance, while assigned value ‘1’ may represent the lowest level of importance. Once the plurality of macro invocations 406, 408, 410, 412 has been inserted at various locations throughout the application program 400, as shown in FIG. 4A, an application programmer may compile the source code of C application program 400. In other words, an interface corresponding to “C” programming language 208 (shown in FIG. 2) within the LRE 144 may translate the source code into “object code”, which can be executed directly by a computer, such as physical computer 110. It should be noted that a compiler program may access macro library 162 to replace each of the plurality of macro invocations 406, 408, 410, 412 with, for example, the assembler code contained in the macro definition 164. Continuing with the foregoing example, once LRE 144 starts executing object code corresponding to the C application program 400, an application programmer may dynamically control the collection of performance metrics, without recompiling the source code, using an operating system command discussed below in conjunction with FIG. 5.
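The idea of probes that are always present in the program but record data only when switched on at run time can be sketched as follows. This is an illustrative analogue, not the assembler macro of the embodiments: the function `probe`, the switch `ACTIVE`, the list `SAMPLES`, and the middle probe with ID 13 and level 50 are all assumptions made for the example.

```python
import time

ACTIVE = False   # hypothetical runtime switch; False means tracing is off
SAMPLES = []     # (probe_id, level, clock) tuples recorded while tracing

def probe(probe_id, level):
    """Stand-in for a macro call carrying an ID and a level parameter."""
    if not ACTIVE:
        return                      # probe is present but stays silent
    SAMPLES.append((probe_id, level, time.perf_counter()))

def work():
    probe(12, 99)                   # possible starting point
    total = sum(range(1000))
    probe(13, 50)                   # intermediate point (assumed values)
    total += sum(range(1000))
    probe(14, 99)                   # possible ending point
    return total

work()                              # tracing off: nothing is recorded
off_count = len(SAMPLES)
ACTIVE = True                       # switched on at run time, code unchanged
work()
print(off_count, [s[0] for s in SAMPLES])  # 0 [12, 13, 14]
```

The program text is identical in both runs; only the runtime switch differs, which mirrors collecting metrics without recompilation.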

FIG. 4B depicts a portion of an exemplary PL/I application program 450. The source code of the PL/I application program 450 may include a plurality of instructions or statements 452 written in “PL/I” language. Each line of PL/I application program 450 may be associated with a line number 454. In this example, lines 2-9 define various functions and parameters. As illustrated in FIG. 4B, application programmers may insert a plurality of macro calls 456, 458, 460, 462 among the plurality of PL/I statements (instructions) 452. In an embodiment of the present invention, similarly to a macro call invocation in the “C” language program 400, each macro invocation (call) 456, 458, 460, and 462 in the PL/I application program 450 may include at least two parameters, such as, for example, an ID parameter and a level parameter. Thus, FIG. 4B illustrates that the plurality of macro calls 456, 458, 460, and 462 may be invoked in the PL/I program 450 in a similar fashion as the macro calls in the “C” program.

FIG. 5 is a flowchart of a method for collecting one or more performance metrics performed by an LRE executing an application program having a plurality of macro calls in accordance with an illustrative embodiment. In an exemplary embodiment, an LRE, such as the LRE 144 included in the LPAR 140, may be executing an application program, such as the application program 146 written in, for example, PL/I language and having a plurality of macro calls 456, 458, 460, 462 included among a plurality of PL/I statements 452, as shown in FIG. 4B. It should be noted that FIG. 4B illustrates the source code of an exemplary application program 146. As previously indicated, LRE 144 may include a compiler program (not shown) which can translate the source code into object code executable by physical computer 110. At 502, LRE 144 may start executing application program 146 according to program instructions stored in an application program object file containing the object code. In an embodiment of the present invention, LRE 144 may execute application program 146 until it receives (decision 504) a signal generated by, for example, OS 132 in response to a specific user-entered operating system command. In an embodiment, a user, such as an application programmer, interested in gathering performance related metrics with respect to application program 146 may enter a command which specifies a plurality of predefined parameters. By way of example and not limitation, the following command may be utilized:

    set zDPS START (ID1), END (ID2), LOC (JOB1), LEVEL (>5), DSN('SYS1.OUTPUT1'), REPEAT (10)

While the current example uses a “set zDPS” operating system command name, where zDPS may stand for IBM z/OS® Dynamic Performance Solution, actual command names may differ in various embodiments. In accordance with an embodiment of the present invention, a value (ID1) of a first parameter (START) may indicate identification information of a macro call corresponding to a location in the source code that should be used as a starting point for collecting desired performance metrics. For example, a user may specify the first parameter value to be equal to “0000000012”, indicating that macro call 456 (shown in FIG. 4B) represents a starting point for gathering desired performance metrics. A value (ID2) of the second parameter (END) may indicate an ID of a macro call corresponding to a location in the source code that should be used as a termination point for collecting desired performance metrics. For example, the user may specify the second parameter value to be equal to “0000000014”, indicating that macro call 460 (shown in FIG. 4B) represents the point in the source code beyond which the desired performance metrics should not be collected. A third parameter (LOC) may include, for example, identification information for identifying a process (or job) corresponding to a particular application program 146 being executed. In accordance with an embodiment of the present invention, one of the predefined parameters, such as fourth parameter LEVEL, may be utilized to control granularity of captured data. In other words, this parameter enables one to implement fine-grained performance data collection. In an embodiment, the level values may range from about 1 to about 99, where 99 indicates, for example, the highest importance value and 1 indicates the lowest importance. In some embodiments, the LEVEL parameter, along with the numeric value, may include a comparison operator, which may be used by LRE 144 to compare the user specified numeric value with the predefined level value assigned to each macro call. The comparison operator can be an equality (“=”), an inclusive inequality (“<=”, “>=”), an exclusive inequality (“<”, “>”), or the like. For illustrative purposes only, assume that the user has inserted five different macro calls at five different locations in the source code and assigned levels 1, 10, 35, 55, and 99 to those macro calls, respectively. When the application program runs, the user may use the LEVEL parameter of the exemplary “set zDPS” command to select which locations in the source code should be included for performance metrics collection purposes. For example, if the user specifies the LEVEL to be greater than or equal to 30, only the last three locations (associated with the corresponding macro calls) in the source code will be included in the performance metrics collection process.
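The level comparison in the five-macro example above can be sketched as follows. The helper names `parse_level` and `selected` and the exact textual form of the LEVEL value are assumptions made for illustration; only the operators and the level values come from the description.

```python
import operator
import re

# Comparison operators named in the description, mapped to Python functions.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

def parse_level(spec):
    """Split a LEVEL value such as '>=30' into (compare_function, number)."""
    m = re.fullmatch(r"\s*(<=|>=|=|<|>)\s*(\d+)\s*", spec)
    if m is None:
        raise ValueError(f"unrecognized LEVEL value: {spec!r}")
    return OPS[m.group(1)], int(m.group(2))

def selected(levels, spec):
    """Return the macro levels that satisfy the user-specified criterion."""
    compare, threshold = parse_level(spec)
    return [lv for lv in levels if compare(lv, threshold)]

print(selected([1, 10, 35, 55, 99], ">=30"))  # [35, 55, 99]
```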

At least in some embodiments, the exemplary zDPS command may include a parameter (such as a fifth parameter, DSN) indicative of desired output options. For example, the user may choose to specify a file name (or data set name) to store collected performance metrics. Yet another parameter, for example a sixth parameter REPEAT, may include information specifying a desired number of iterations to collect performance metrics.
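For illustration, the six parameters of the exemplary command could be separated into name and value pairs as sketched below. The parsing rules here (a fixed “set zDPS” prefix and NAME (value) pairs) are an assumption inferred from the single example; the description states that actual command names may differ, and `parse_command` is a hypothetical helper.

```python
import re

def parse_command(text):
    """Parse 'set zDPS NAME (value), ...' into a dict of parameter values."""
    prefix = "set zDPS"
    if not text.startswith(prefix):
        raise ValueError("not a set zDPS command")
    # Each parameter is a name followed by a parenthesized value.
    pairs = re.findall(r"(\w+)\s*\(([^)]*)\)", text[len(prefix):])
    return {name.upper(): value.strip().strip("'") for name, value in pairs}

cmd = ("set zDPS START (0000000012), END (0000000014), LOC (JOB1), "
       "LEVEL (>5), DSN('SYS1.OUTPUT1'), REPEAT (10)")
params = parse_command(cmd)
print(params["START"], params["LEVEL"], params["DSN"], params["REPEAT"])
```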

Referring back to FIG. 5, in response to the user request to initiate performance tracing via the exemplary zDPS command, at 506, LRE 144 may read the plurality of the predefined parameter values, such as START (ID1), END (ID2), LEVEL (>5), etc., obtained from the user. Next, LRE 144 may, at 510, continue executing application program 146 (decision 508, no branch) until one of the plurality of macro calls 456, 458, 460, 462 is reached (decision 508, yes branch). At 512, LRE 144 may compare the predefined ID parameter value corresponding to the reached macro call to a value (ID1) of the first parameter (START) specified by the user. If there is no match (decision 512, no branch), then LRE 144 may, at 510, continue executing application program 146. In response to determining that the macro ID matches the ID specified by the user as a starting point for collecting performance metrics (ID1) (decision 512, yes branch), LRE 144 may, at 514, capture performance metrics for the starting point. In an embodiment, capturing performance metrics at 514 may include recording the value of the hardware clock (using, for example, essential runtime services interface 212) as the starting point value.

Next, at 516, LRE 144 may continue executing code of application program 146 until the next macro call is reached (decision 518). In response to arriving at a given macro call (decision 518, yes branch), at 520, LRE 144 may compare the predefined ID parameter value corresponding to the reached macro call to a value (ID2) of the second parameter (END) specified by the user. The reached macro call may not be the desired end point of the segment of code that needs to be measured. Referring back to FIG. 4B, if the user specified starting and ending points as macro calls 456 and 460, once LRE 144 reaches macro call 458 it will compare the ID parameter value corresponding to macro call 458 (‘0000000013’) to the user specified value ID2 (in this example ‘0000000014’). In response to determining that the macro ID does not match the ID specified by the user as an ending point of the code segment for which performance metrics are collected (decision 520, no branch), LRE 144, at 522, may compare the predefined level parameter value corresponding to the reached macro call to the performance metrics collection criterion specified by the user as the LEVEL parameter value of the exemplary zDPS command. It should be noted that the LEVEL parameter value may include a comparison operator in addition to a specific numeric value (for example, “>5”). In response to determining that the level parameter corresponding to the reached macro call does not satisfy the user specified criterion (decision 522, no branch), LRE 144 may return to 516. On the other hand, if the level parameter corresponding to the reached macro call does satisfy the user specified criterion (decision 522, yes branch), at 524, LRE 144 may capture performance metrics for this intermediate point within the segment of code whose execution is being measured. In an embodiment, capturing performance metrics may include recording the value of the hardware clock as the value corresponding to this intermediate point. Subsequent to capturing performance metrics at 524, LRE 144 may return to 516.
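The decisions 512, 520, and 522 described above can be sketched as a small loop over the macro calls reached during execution. This is a simplified model under stated assumptions: the run is represented as a prepared list of (ID, level, clock) tuples rather than a live program, and the function `collect`, the sample IDs 11, 15, and 16, and all clock values are invented for the example.

```python
def collect(events, start_id, end_id, level_ok):
    """Return the (probe_id, clock) points captured for one traced segment."""
    captured = []
    started = False
    for probe_id, level, clock in events:
        if not started:
            if probe_id == start_id:                 # decision 512
                captured.append((probe_id, clock))   # capture start (514)
                started = True
            continue
        if probe_id == end_id:                       # decision 520
            captured.append((probe_id, clock))       # capture end (526)
            break
        if level_ok(level):                          # decision 522
            captured.append((probe_id, clock))       # intermediate (524)
    return captured

# Hypothetical run: each tuple is (macro ID, macro level, clock value).
events = [(11, 99, 0.0), (12, 99, 1.0), (13, 3, 2.5),
          (15, 40, 3.0), (14, 99, 4.0), (16, 99, 5.0)]
points = collect(events, start_id=12, end_id=14, level_ok=lambda lv: lv > 5)
print(points)  # [(12, 1.0), (15, 3.0), (14, 4.0)]
```

Macro 13 is skipped because its level of 3 fails the “>5” criterion, and macro 16 is never examined because collection stops at the ending point.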

In response to determining that the macro ID matches the ID specified by the user as an ending point of the code segment for which performance metrics are collected (decision 520, yes branch), LRE 144, at 526, may capture performance metrics for the ending point and perform calculations based on the captured performance metrics. For example, in an embodiment, LRE 144 may simply subtract the starting point time value from the ending point time value to determine an elapsed execution time. In various embodiments, performing calculations at 526 may include determining elapsed execution time between any two of a plurality of macro calls for which the data was captured in accordance with the user specified granularity criteria. It should be noted that LRE 144, in at least one embodiment, may also determine CPU time consumption by an application program. In an embodiment, each of the plurality of macro calls may include a block of code which may be executed by LRE 144 to measure and accumulate CPU time values.
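The elapsed-time calculation at 526 amounts to subtracting one captured clock value from a later one. The following Python sketch shows this for every pair of captured points; the function name `elapsed_between`, the `(macro_id, clock_value)` tuple layout, and the sample clock values are assumptions made for illustration, as is the starting ID ‘0000000012’.

```python
from itertools import combinations


def elapsed_between(captures):
    """Map each pair of capture points (earlier, later) to the elapsed clock units.

    captures: list of (macro_id, clock_value) tuples in the order they were taken.
    """
    return {(a_id, b_id): b_time - a_time
            for (a_id, a_time), (b_id, b_time) in combinations(captures, 2)}


# Hypothetical captures for a starting point, one intermediate point and the ending point.
captures = [("0000000012", 100), ("0000000013", 140), ("0000000014", 175)]
elapsed = elapsed_between(captures)
total = elapsed[("0000000012", "0000000014")]  # ending value minus starting value
```

Because `combinations` preserves input order, each key pairs an earlier capture with a later one, so the differences are non-negative whenever the clock is monotonic.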

At 528, LRE 144 may send the collected performance metrics along with the calculation results to a display on a console. Alternatively, LRE 144 may send the captured performance metrics along with the calculation results, for example, to the file corresponding to the file name provided by the user in the DSN parameter. In an embodiment, the captured performance metrics may be stored in an XML (eXtensible Markup Language) format. Once performance metrics are stored in one or more files, these files may be analyzed by a user, for example, via a Graphical User Interface (GUI), which may run in any operating system 132, 142, 152 included in physical computer 110. Alternatively, the user may analyze the captured performance metrics via a variety of software tools running, for example, on a remote computer connected to the physical computer 110.
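As one possible illustration of storing captured metrics in XML, the Python sketch below serializes a list of captures and an elapsed-time result using the standard `xml.etree.ElementTree` module. The disclosure does not specify an XML schema, so the element and attribute names (`performanceMetrics`, `capture`, `id`, `clock`, `elapsed`) and the function name `metrics_to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def metrics_to_xml(captures, elapsed):
    """Serialize captures and an elapsed-time result to an XML string.

    captures: list of (macro_id, clock_value) tuples.
    elapsed: elapsed execution time in clock units.
    The element names used here are illustrative only.
    """
    root = ET.Element("performanceMetrics")
    for macro_id, clock_value in captures:
        # One <capture> element per measured point, keyed by the macro call ID.
        ET.SubElement(root, "capture", id=macro_id, clock=str(clock_value))
    ET.SubElement(root, "elapsed").text = str(elapsed)
    return ET.tostring(root, encoding="unicode")


xml_text = metrics_to_xml([("0000000013", 140), ("0000000014", 175)], 35)
```

The resulting string could then be written to the file named in the DSN parameter or parsed by an analysis tool on a remote computer.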

In summary, the method of performance metrics collection described herein provides flexibility for capturing performance metrics for a variety of computer programs using different levels of granularity. Advantageously, in accordance with an embodiment of the present invention, fine-grained performance metrics may be collected down to a single line of source code (including a single assembler instruction), without recompilation of the monitored computer application programs having a plurality of macro calls inserted therein.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the large computer server, partly on the large computer server, as a stand-alone software package, partly on the large computer server and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the large computer server through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is:
 1. A method for collecting one or more performance metrics, the method comprising: providing a plurality of source code instructions having a plurality of macro calls among the plurality of source code instructions, wherein at least one of the plurality of macro calls includes a plurality of predetermined parameters and wherein the at least one of the plurality of macro calls is assigned a desired level of granularity; executing a plurality of object code instructions corresponding to the plurality of source code instructions; receiving a user request to collect the one or more performance metrics, wherein the user request specifies the predetermined parameters, and wherein the predetermined parameters include a starting point, a ending point and the desired level of granularity; determining whether the starting and ending points matches a plurality of reached macro call identifiers of least one of the plurality of macro calls; comparing, if the starting and ending points does not match the plurality of reached macro call identifiers, a predefined level parameter value with the desired level of granularity of least one of the plurality of macro calls; and collecting, the one or more performance metrics using the identified at least one of the plurality of macro calls, if the predefined level parameter value matches the desired level of granularity.
 2. The method of claim 1, wherein the plurality of predetermined parameters uniquely identifies one of the plurality of macro calls.
 3. The method of claim 1, wherein the identified at least one of the plurality of macro calls executes a macro program.
 4. The method of claim 1, wherein the plurality of macro calls includes a first macro call indicative of a first source code location and a second macro call indicative of a second source code location, wherein the identified at least one of the plurality of macro calls comprises the first macro call, and wherein collecting the one or more performance metrics comprises calculating a value indicating an elapsed execution time of a portion of the plurality of source code instructions, the portion located between the first source code location and the second source code location.
 5. The method of claim 1, wherein the one or more performance metrics comprise at least one of an elapsed execution time and CPU time spent during execution of a portion of the plurality of source code instructions.
 6. The method of claim 1, wherein the desired level of granularity can be adjusted by a user dynamically using an operating system command.
 7. The method of claim 1, wherein collecting the one or more performance metrics comprises generating an XML file.
 8. A computer program product for collecting one or more performance metrics, the computer program product comprising one or more computer-readable tangible storage devices, and a plurality of program instructions stored on at least one of the one or more storage devices, the plurality of program instructions comprising: program instructions to provide a plurality of source code instructions having a plurality of macro calls among the plurality of source code instructions, wherein at least one of the plurality of macro calls includes a plurality of predetermined parameters and wherein the at least one of the plurality of macro calls is assigned a desired level of granularity; program instructions to execute a plurality of object code instructions corresponding to the plurality of source code instructions; program instructions to receive a user request to collect the one or more performance metrics, wherein the user request specifies the predetermined parameters, and wherein the predetermined parameters include a starting point, a ending point and the desired level of granularity; program instructions to determine whether the starting and ending points matches a plurality of reached macro call identifiers of least one of the plurality of macro calls; program instructions to compare, if the starting and ending points does not match the plurality of reached macro call identifiers, a predefined level parameter value with the desired level of granularity of least one of the plurality of macro calls; and program instructions to collect, the one or more performance metrics using the identified at least one of the plurality of macro calls, if the predefined level parameter value matches the desired level of granularity.
 9. The computer program product of claim 8, wherein the plurality of predetermined parameters uniquely identifies one of the plurality of macro calls.
 10. The computer program product of claim 8, wherein the identified at least one of the plurality of macro calls comprises program instructions to execute a macro program.
 11. The computer program product of claim 8, wherein the plurality of macro calls includes a first macro call indicative of a first source code location and a second macro call indicative of a second source code location, wherein the identified at least one of the plurality of macro calls comprises the first macro call, and wherein the program instructions to collect the one or more performance metrics comprise program instructions to calculate a value indicating an elapsed execution time of a portion of the plurality of source code instructions, the portion located between the first source code location and the second source code location.
 12. The computer program product of claim 8, wherein the one or more performance metrics comprise at least one of an elapsed execution time and CPU time spent during execution of a portion of the plurality of source code instructions.
 13. The computer program product of claim 8, wherein the desired level of granularity can be adjusted by a user dynamically using an operating system command.
 14. A computer system for collecting one or more performance metrics, the computer system comprising one or more processors, one or more computer-readable tangible storage devices, and a plurality of program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors, the plurality of program instructions comprising: program instructions to provide a plurality of source code instructions having a plurality of macro calls among the plurality of source code instructions, wherein at least one of the plurality of macro calls includes a plurality of predetermined parameters and wherein the at least one of the plurality of macro calls is assigned a desired level of granularity; program instructions to execute a plurality of object code instructions corresponding to the plurality of source code instructions; program instructions to receive a user request to collect the one or more performance metrics, wherein the user request specifies the predetermined parameters, and wherein the predetermined parameters include a starting point, a ending point and the desired level of granularity; program instructions to determine whether the starting and ending points matches a plurality of reached macro call identifiers of least one of the plurality of macro calls; program instructions to compare, if the starting and ending points does not match the plurality of reached macro call identifiers, a predefined level parameter value with the desired level of granularity of least one of the plurality of macro calls; and program instructions to collect, the one or more performance metrics using the identified at least one of the plurality of macro calls, if the predefined level parameter value matches the desired level of granularity.
 15. The computer system of claim 14, wherein the plurality of predetermined parameters uniquely identifies one of the plurality of macro calls.
 16. The computer system of claim 14, wherein the identified at least one of the plurality of macro calls comprises program instructions to execute a macro program.
 17. The computer system of claim 14, wherein the plurality of macro calls includes a first macro call indicative of a first source code location and a second macro call indicative of a second source code location, wherein the identified at least one of the plurality of macro calls comprises the first macro call, and wherein the program instructions to collect the one or more performance metrics comprise program instructions to calculate a value indicating an elapsed execution time of a portion of the plurality of source code instructions, the portion located between the first source code location and the second source code location.